Introduction

Row {data-width = 600}

Introduction

The data used in the report was obtained from VicRoads, and contains comprehensive information on approximately 200,000 accidents that occurred in the state from 2006 to 2020.

Purpose of this analysis report:

  • To explore the trends and patterns in Victoria’s road safety data

  • Identifying which factors are associated with a higher risk of accident or death

  • And seeking to explain those relationships

Research questions:

  • Part 1:

    • The impact of temporal factors, such as year, weekday and hour, on the number of accidents.
  • Part 2:

    • The relationship of speed and the age of vehicles with the death rate from accidents.
  • Part 3:

    • Explores the effect of age and gender on accident numbers, as well as which roads in Victoria are most accident-prone and most deadly.

TEAM MEMBERS

Name Email Address Student Id
Kaihao Chen 27439992
Arek Chouzadjian 28644182
Ibrahim Al-Hindi 24112488

Part 1

Row

Accidents per year

Accidents by weekday

Row

Accidents by hour

Deaths by hour

Part 2

Row

Deaths by speed zone

Row

Death rate by year of vehicle manufacture

Regression model

Goodness of fit

term estimate std.error statistic p.value
(Intercept) 0.4420226 0.0627854 7.040211 0e+00
vehicle_year_manuf -0.0002169 0.0000314 -6.918470 1e-07
r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual nobs
0.5846833 0.5724681 0.0019542 47.86523 1e-07 1 174.5064 -343.0127 -338.2622 0.0001298 34 36

Part 3

Row

Accidents Map

Roads with Most Accidents and Highest Death Rates

Accidents by road
Road Accidents
PRINCES HIGHWAY 3581
HIGH STREET 3096
NEPEAN HIGHWAY 2376
SPRINGVALE ROAD 1663
SOUTH GIPPSLAND HIGHWAY 1538
SYDNEY ROAD 1538
MONASH FREEWAY 1533
MAROONDAH HIGHWAY 1335
Deadliest roads
Road Accidents Deaths Deaths_per_accident
GLENELG HIGHWAY 231 28 0.1212121
GOULBURN VALLEY HIGHWAY 344 39 0.1133721
WIMMERA HIGHWAY 156 17 0.1089744
MURRAY VALLEY HIGHWAY 727 76 0.1045392
STRZELECKI HIGHWAY 106 11 0.1037736
HAMILTON HIGHWAY 223 21 0.0941704
MELBA HIGHWAY 211 18 0.0853081
NORTHERN HIGHWAY 300 25 0.0833333

Accidents by Gender

Death Rate by User Type

---
title: "Motor Vehicle Accidents in Victoria"
output: 
  flexdashboard::flex_dashboard:
        vertical_layout: scroll
        orientation: rows
        source_code: embed
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE,
                      eval = TRUE,
                      message = FALSE, 
                      warning = FALSE)
```

```{r libraries}
library(flexdashboard)
library(tidyverse)
library(lubridate)
library(janitor)
library(plotly)
library(ggResidpanel)
library(broom)
library(knitr)
library(kableExtra)
library(ggmap)
library(ggthemes)
```

```{r load-data}
accidents <- read_csv("data/ACCIDENT.csv") %>%
  clean_names()

locations <- read_csv("data/ACCIDENT_LOCATION.csv") %>%
  clean_names()

nodes <- read_csv("data/NODE.csv") %>%
  clean_names()

persons <- read_csv("data/PERSON.csv") %>%
  clean_names()

vehicles <- read_csv("data/VEHICLE.csv") %>%
  clean_names()
```

```{r create-year-hour-day}
accidents <- accidents %>%
  mutate(accidentdate = dmy(accidentdate),
         Year = year(accidentdate),
         Hour = hour(accidenttime),
         Weekday = wday(accidentdate, 
                        label = TRUE,
                        abbr = FALSE))
```

Introduction {data-icon="fa-address-book"}
===================================== 

Row {data-width = 600}
-----
### **Introduction**

The data used in the report was obtained from VicRoads, and contains comprehensive information on approximately 200,000 accidents that occurred in the state from 2006 to 2020. 


**Purpose of this analysis report**:
  
* To explore the trends and patterns in Victoria’s road safety data

* Identifying which factors are associated with a higher risk of accident or death

* And seeking to explain those relationships


**Research questions**:

* Part 1: 
  
  - The impact of temporal factors, such as year, weekday and hour, on the number of accidents.

* Part 2: 

  - The relationship of speed and the age of vehicles with the death rate from accidents.

* Part 3: 

  - Explores the effect of age and gender on accident numbers, as well as which roads in Victoria are most accident-prone and most deadly.

***

**TEAM MEMBERS**

|Name	            |Email Address              |Student Id|
|:---------------:|:-------------------------:|:--------:|
|Kaihao Chen      |kche154@student.monash.edu | 27439992 |
|Arek Chouzadjian |acho0007@student.monash.edu| 28644182 |
|Ibrahim Al-Hindi |imalh2@student.monash.edu  | 24112488 |
 
***

Part 1 {data-icon="fa-battery-quarter"}
===================================== 

Row {data-width=600}
--------------------------------
### Accidents per year

```{r chen1}
accidents_per_year <- accidents %>%
  count(Year) %>% 
  ggplot(aes(x = Year, 
             y = n)) +
  geom_line() +
  xlab("Year(2006 ~ 2020)") +
  ylab("Number of Car Accidents") +
  geom_point() 

ggplotly( accidents_per_year)
```

### Accidents by weekday

```{r chen2}
accidents %>%
  count(Weekday,
        name = "Accidents") %>% 
  ggplot(aes(x = Weekday, 
             y = Accidents)) +
  geom_bar(stat = "identity", 
           fill = "#999999") +
  ylab("Number of Car Accidents") +
  geom_text(aes(label = Accidents), 
            vjust = -1, 
            color = "black", 
            size = 3)
```


Row {data-width=600}
--------------------------------
### Accidents by hour

```{r chen3}
accidents_by_hour <- accidents %>%
  count(Hour,
        name = "Accidents") 

p3 <- accidents_by_hour %>%
  ggplot(aes(x = Hour, 
             y = Accidents)) + 
  geom_line() +
  xlab("Time") +
  ylab("Number of Car Accidents") +
  geom_point()

ggplotly(p3)
```


### Deaths by hour

```{r deaths-by-hour}
deaths_by_hour <- accidents %>%
  group_by(Hour) %>%
  tally(no_persons_killed,
        name = "Deaths")
```

```{r deaths-per-accident-by-hour}
deaths_per_accident_by_hour <- accidents_by_hour %>%
  left_join(deaths_by_hour) %>%
  mutate(Deaths_per_accident = round(Deaths/Accidents, digit = 4))
```

```{r chen4}
p4 <- deaths_per_accident_by_hour %>%
  ggplot(aes(x = Hour, 
             y = Deaths_per_accident)) +
  geom_line() +
  xlab("Time") +
  ylab("Death Rate of Car Accidents") +
  geom_point()  
ggplotly(p4)
```

Part 2 {data-icon="fa-battery-half"}
===================================== 

Row {.tabset data-height=500}
------------

### **Deaths by speed zone**

```{r accidents-by-speed-zone}
accidents_by_speed_zone <- accidents %>%
  count(speed_zone,
        name = "Accidents")
```

```{r deaths-by-speed-zone}
deaths_by_speed_zone <- accidents %>%
  group_by(speed_zone) %>%  
  tally(no_persons_killed,
        name = "Deaths")
```

```{r deaths-per-accident-by-speed-zone}
deaths_by_accident <- accidents_by_speed_zone %>%
  left_join(deaths_by_speed_zone) %>%
  mutate(Deaths_by_accident = Deaths/Accidents)
```

```{r deaths-per-accident-plot}
deaths_by_accident %>%
  mutate(speed_zone = as.numeric(speed_zone)) %>%
  filter(speed_zone %in% seq(30, 110, 10)) %>% 

  ggplot(aes(y = Deaths_by_accident, 
             x = speed_zone)) + 
  geom_line() +
  labs(x = "Speed Zone",
       y = "Deaths by Accident")

ggplotly()
```

Row {.tabset data-height=500}
------------
### **Death rate by year of vehicle manufacture**

```{r join-person-and-vehicle}
person_vehicle <- persons %>%
  left_join(vehicles)
```

```{r total-people-involved-in-accidents-per-manufature-year}
person_vehicle_total <- person_vehicle %>%
  group_by(vehicle_year_manuf) %>%
  tally(name = "Persons",
        sort = TRUE)

person_vehicle_deaths <- person_vehicle %>%
  filter(inj_level_desc == "Fatality") %>%
  group_by(vehicle_year_manuf) %>%
  tally(name = "Fatalities",
        sort = TRUE)
```

```{r join-total-and-deaths}
death_rate_by_year_manuf <- person_vehicle_total %>%
  left_join(person_vehicle_deaths) %>%
  mutate(death_rate = Fatalities/Persons) %>%
  arrange(desc(vehicle_year_manuf))
```

```{r plot-death-rate-by-year-manuf}
manuf_year_death_rate <- death_rate_by_year_manuf %>%
  filter(vehicle_year_manuf >= 1985 & vehicle_year_manuf < 3001)

p1 <- manuf_year_death_rate %>%
  ggplot(aes(x = vehicle_year_manuf, 
             y = death_rate)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "Year Manufactured",
       y = "Death Rate")

ggplotly(p1)
```

### **Regression model**

```{r regression-model}
manuf_year_death_rate_lm <- lm(death_rate ~ vehicle_year_manuf, data = manuf_year_death_rate)

resid_panel(manuf_year_death_rate_lm, plot = "all")

```

### **Goodness of fit**

```{r}
tidy(manuf_year_death_rate_lm) %>% 
  kable() %>% 
  kable_styling(bootstrap_options = "striped")

glance(manuf_year_death_rate_lm) %>% 
  kable() %>% 
  kable_styling(bootstrap_options = "striped")
```


Column {.sidebar data-width=350}
----

***

> **Findings**

1. Accidents become increasingly serious the faster the speed at which they occur. The risk of dying in an accident is nearly 13 times higher in a 110km/h zone (0.064 deaths/accident) than in a 40km/h zone (0.005 deaths/accident). 


2. A person's risk of dying in an accident is positively correlated with the age of their vehicle; the newer the make, the less likely it is that a person will be killed in an accident. For every year older a vehicle is, the risk of dying if you have an accident in it increases by 0.0002 deaths per accident. The reason for this is improved safety standards for vehicles. 

***

Part 3 {data-icon="fa-battery-three-quarters"}
===================================== 

Row {.tabset data-height=800}
---------

### **Accidents Map**

```{r accidents-map}
location_box <- c(min(nodes$long), 
                  min(nodes$lat), 
                  max(nodes$long),
                  max(nodes$lat))

location_map <- get_map(location = location_box, 
                        source = 'osm')

  ggmap(location_map) + 
    geom_point(data = nodes,
               aes(x = long,
                   y = lat), 
               colour="#0072B2", 
               alpha=0.5, 
               size = 0.05) +
    theme_map()
```

### **Roads with Most Accidents and Highest Death Rates**

```{r join-road-names}
roads_accidents <- accidents %>%
  left_join(locations) %>% 
  mutate(Road = paste(road_name,
                      road_type))
```

```{r accidents-by-road}
accidents_per_road <- roads_accidents %>%
  count(Road, 
        name = "Accidents"
        ,sort = T)

deaths_per_road <- roads_accidents %>%
  group_by(Road) %>%
  tally(no_persons_killed,
        name = "Deaths",
        sort = TRUE)

accidents_per_road %>% 
  head(8) %>% 
  kable(caption = "Accidents by road") %>% 
  kable_styling(bootstrap_options = "striped")
```

```{r deadliest-road}
deadliest_road <- accidents_per_road %>%
  left_join(deaths_per_road) %>%
  filter(Accidents >= 100) %>%
  mutate(Deaths_per_accident = Deaths/Accidents) %>%
  arrange(desc(Deaths_per_accident))

deadliest_road %>% 
  head(8) %>% 
  kable(caption = "Deadliest roads") %>% 
  kable_styling(bootstrap_options = "striped")
```

### **Accidents by Gender**

```{r accidents-gender-table}
persons <- persons %>%
  mutate(sex = recode(sex,
                      "F" = "Female",
                      "M" = "Male",
                      "U" = "Unknown"))
```

```{r accidents-gender-plot}
persons %>% 
  filter(road_user_type_desc == "Drivers",
         sex %in% c("Female",
                    "Male"),
         age > 15) %>% 
  count(age,
        sex,
        name = "Accidents") %>% 
  
  ggplot(aes(x = age,
             y = Accidents, 
             color = sex)) +
  geom_line() +
  xlab("Age")
  
ggplotly()
```

### **Death Rate by User Type**

```{r user-death-rate}
user_type <- persons %>% 
  count(road_user_type_desc, 
        name = "User")

user_type_fatal <- persons %>% 
  filter(inj_level_desc == "Fatality") %>% 
  count(road_user_type_desc, 
        name = "Fatal")

user_type %>% 
  left_join(user_type_fatal) %>% 
  mutate(death_rate = Fatal / User) %>% 
  
  ggplot(aes(x = fct_reorder(road_user_type_desc, 
                             death_rate), 
             y = death_rate)) +
  geom_col(fill = "#999999") +
  labs(x = "User Type",
       y = "Death Rate") +
  theme(axis.text.x = element_text(angle = 45,
                                   hjust = 1))
  
```

Column {.sidebar data-width=350}
----

***

> **Findings**

1. The accidents are concentrated around Melbourne and branch out as we get further away. Small concentrations are also present around other cities and towns

2. Regional highways are the deadliest types of roads

3. Males commit more accidents than females. Both commit the highest number of accidents as new drivers at a young age, the accidents steadily decrease as the age increases

4. Pedestrians are the biggest casualty of accidents with the highest number of fatalities. Bicyclists are the least at risk of death

***